THE PROBLEM


To  look  at  the  variance  in  intelligibility  among  talkers  when 
noise  is  mixed  with  the  signal  at  the  listener's  ear. 

FINDINGS 

With individual voices equated for intensity, a range of 28% in
words correct was found for normal-hearing listeners. A minimum
of five voices was needed to sample the variance among talkers
properly. Acoustic and/or perceptual qualities which rendered
certain voices more intelligible were discussed but not
quantified in this report.

APPLICATION 

For  communications  engineers  designing  circuit  tests  using 
specific  word  lists  and  specific  talkers. 
ABSTRACT


In  an  effort  to  determine  what  acoustic  differences  render  one 
voice  more  intelligible  than  others,  especially  in  the  presence  of 
noise,  or  reverberation,  or  when  breathing  helium  gas  mixtures, 
experiments  were  conducted  using  ten  young  adult  talkers. 

The work of numerous other researchers on this problem is also
reviewed and discussed (32 references).

It is concluded that it may not be good practice to depend
upon a single voice in assessing the performance of a
communication system. Even when ten talkers with dialect-free
speech, speaking at a comfortable level, were equated for acoustic
intensity, one talker yielded performance of 75.5% words
correct, another only 47.5%. It would be necessary to take the
average intelligibility of any five of these talkers in order to
assess a communication system properly.

More  research  needs  to  be  done,  to  study  the  acoustic  and 
perceptual  cues  which  render  a  particular  voice  highly  intelligible 
in  the  presence  of  noise. 
VARIABILITY AMONG TEN TALKERS OF WORD INTELLIGIBILITY IN NOISE


INTRODUCTION 

The speech of some talkers is more intelligible than that of
others. The important questions involved are (1) what are the
acoustic differences which render one voice more intelligible than
others, and (2) what are the relations among talkers'
intelligibility not only in quiet but also in noise, in
reverberation, when peak-clipped, when breathing helium, etc.? If
certain voices could be shown to be especially resistant to all
types of distortion, and their invariant acoustic characteristics
easily identified, one could select especially suitable (and reject
unsuitable) voices by means of a quick acoustic analysis of the
voice; and possibly, if the dominant characteristics were open to
modification, all or most talkers could be trained to meet certain
standards of communication ability.

Variance Among Talkers of Intelligibility in Quiet

In some intelligibility studies conducted by W. B. Snow and
A. Meyer at the Bell Telephone Laboratories (see Fletcher and
Galt¹), the variation in voice level for four men and four women
was found to be as great as 16 dB. Certainly these differences
contribute profoundly to differences in intelligibility. Fletcher
and Steinberg², in an even more extensive study, also found that
the normal talking level of persons varied rather widely, as
expected. But when these differences were compensated for by
having all talkers use a Volume Indicator while speaking,
differences among talkers remained such that a minimum of 5
different normal voices, each using good clear standard general
American dialect, was needed to establish mean intelligibility for
such voices. They found that when voice level was equated, 21 men
were, on the average, a bit more intelligible for nonsense
syllables in quiet than 23 women. The classic study of Dunn and
White³ showed that the average speech spectrum for 6 men was a bit
stronger at 300-3000 Hz than for 5 women. Benson and Hirsh's⁴
study reported overall levels about 10 dB lower. Differences
between these studies can only be attributed to talker variances.

In their fundamental paper on speech intelligibility, Fletcher
and Galt¹ showed that the general talking level (re watt/cm²) of
95% of their talkers varied from 55-75 dB around an average of
68 dB. Thus, on speech intensity alone, a wide range of
intelligibility among talkers could be expected. Fletcher and Galt
also attempted to quantify two acoustic contributions of masking
to intelligibility: the residual masking of one speech sound upon
another later sound, and the masking of one sound on a
simultaneous sound in a different region. Presumably both types of
masking will vary among voices with differences in the frequency
spectrum and temporal pattern.

Just as for isolated vowels the relationships or differences in
frequency among formants are relatively constant among talkers or
for different voice pitches, so also isolated sustained consonants
reveal formants which bear certain relatively constant
relationships to each other. Note, however, that in conversation
the acoustic structure of the consonant is modified by adjacent
speech sounds. In fact, C. M. Harris⁵ found that removal of these
transitions, which characterize a particular voice, rendered it
much less intelligible. The problem is therefore acoustically
quite complex.

Some attempts have been made to determine just what
characterizes an intelligible voice in quiet. Moser et al⁶ found
that intelligibility of six normal male talkers varied as a
function of induced hypernasality or hyponasality, significantly
more so for some talkers than for others. They stated,
"Hypernasality might be experienced as a result of foreign
language influence or insufficient closure of the nasopharynx; it
would be expected as a continuous, albeit perhaps relatively rare,
condition associated with a particular speaker. On the other hand,
a hyponasal condition caused by the blockage of the nasal passage
might occur at frequent intervals for any speaker."

For  normal  speech  in  quiet  we  may 
conclude  that  talker  differences  create 
ranges  of  about  20  dB  in  overall  level, 
but  even  when  this  feature  is  controlled, 
about  5  voices  are  needed  to  sample  the 
remaining  differences  in  intelligibility 
among  good  clear  talkers. 

Variance Among Talkers of Intelligibility in Noise

During World War II interest was high in predicting and
controlling intelligibility not in quiet, but in high-level
military noise. It could not be supposed a priori that those
qualities which rendered a voice intelligible for nonsense
syllables in quiet also allowed it to cut through all noises.

Abrams et al⁷ found intelligibility scores of twelve
normal-talking young men to range from 44 to 85 per cent correct
working against airplane noise. Abrams et al⁸ studied the
performance of 28 male college students, 10 female talkers
(secretaries) with normal speech, and 28 untrained enlisted men,
against background noise. The sailors yielded standard deviations
of about 9 per cent intelligibility with listener panels; with one
word list, for example, mean intelligibility was 66%, individual
talkers ranging (±2 SD) from 47.2 to 84.4%. For 22 men vs 10
women, mean scores were equal (47, 48%), with mean readings on the
VU also the same (-3.8, -3.7). As noted above, Fletcher and
Steinberg² showed 21 men to be on the average a bit more
intelligible than 23 women, for nonsense syllables in quiet.
Abrams et al⁹ studied the intelligibility in noise of the speech
of 270 college freshmen. Their speech professor identified 14 men
with the most faulty or undesirable speech traits. The following
quote from reference 9 shows that these were in fact quite faulty.

"The  14  members  of  Group  P 
showed  no  obvious  speech  faults. 
Reference  to  Professor  Packard's 
interview  cards  for  members  of  Group 
F  showed  that  they  had  been  selected 
for  the  following  speech  faults: 

1.  DB - Sloppy diction; poor clarity; unaccented syllables;
    inaudible; lack of confidence; pronounced regional (Michigan)
    dialect.

2.  RC - Nasality; imperfect enunciation: "These" - "Dese", etc.,
    very hesitant; strong racial (French-Canadian) dialect.

3.  HD - Sloppy and careless diction; lack of confidence; high
    immature nasal voice; lack of force and precision; rate too
    fast and irregular.

4.  GG - Sloppiest diction imaginable; motionless lips and jaw; bad
    dental consonants; too rapid rate; no stress; bad regional
    (South Carolina) dialect.

5.  PG - Sloppy diction; poor clarity; lack of earnestness; slow,
    monotonous rate.

6.  PH - High pitch; thin vowels; too fast; jumpy inflections;
    effeminate intonation.

7.  SH - Poor clarity; faulty consonants; L is nasalized; R = W; T
    not explosive enough; glottal stop used; "these" - "dese",
    etc.; racial (Jewish) type of phoneme enunciation.

8.  HH - Growling tone; very hesitant rhythm; poor L.

9.  JK - Sloppy diction; flat nasal "A"; hesitant; faulty
    pronunciation ("excape", "goin"); bad reader.

10. RL - Sloppy, jumbled, elided diction; extreme nasality;
    hesitant; defective reader.

11. PP - Nasal; thin vowels; hesitant; faulty pronunciation
    ("excape", "goin"); bad reader.

12. RS - Hastily; low depressing tone; indistinct elided diction;
    faulty pronunciation of stop consonants.

13. ES - Poor clarity; hesitant; lack of confidence; nasality;
    regional (Maine) rustic pronunciation; very defective reader.

14. CW - Very poor clarity and confidence; very hesitant;
    nasality; meager vocabulary; poor reader."

Nevertheless, compared with normal controls, when both groups
spoke through low-fidelity circuitry against high-level noise, the
poor talkers were nearly as intelligible as the good talkers (for
words, the poor talkers had a mean of 56%, S.D. of 8.0%, while the
good talkers had a mean of 63%, S.D. of 9.8%; for sentence tests,
the poor talkers had a mean of 66%, S.D. of 9.4, and the good
talkers a mean of 78%, S.D. of 10.8). When, to salve the feelings
of the Speech Department, they were asked to characterize the
least intelligible talkers, the speech professors abandoned the
terminology shown above for Group F and spoke of "distinctness and
force in the utterance of consonants", "maintenance of a
deliberate rate of speech", "regularity of loudness-level", "rate
of phrasing", and "a clear ringing tone as opposed to a choked
muffled, weak tone". An analysis of individual voices showed that
poor loudness was a dominant factor, but not the only one. Of
those with poor intelligibility, only one had a loud voice, but
this was counterbalanced by poor tone quality and extremely
inaccurate consonant articulation. All high-intelligibility
talkers had a loud voice, good clear tonal quality, and forceful
or exaggerated consonant articulation.

Miller et al¹⁰ correlated intelligibility of 47 talkers against
four acoustic indexes of the voice, and against judges' ratings on
9 separate criteria. The most important feature was the speech
intensity used, which is not helpful here as presumably the less
intelligible talkers could have spoken louder. But by using a
multiple-correlation technique, the authors found that a
noise-penetrating quality emerged, composed of the acoustic cues
of intensity, pitch, and "peakedness" combined with the perceptual
cues of judged strength and judged precision of the consonants.
Rate in words/min was not important so long as it did not exceed
about 120 words/min. On the basis of these data, NSMRL during
World War II conducted a telephone talker's school, giving a
four-hour course on handling a mike, speaking as loud as possible
without strain, and articulating distinctly.

Black and Mason¹¹ covered substantially the same ground, finding
that even when loudness was comparable the intelligibility of a
sample of 136 unselected and untrained talkers differed, the
standard deviation of differences in intelligibility remaining at
about 10% (i.e., intelligibility scores for 2/3 of talkers ranged
from 60-80 percent correct). Black¹² has summarized his positive
findings on differences in talker intelligibility: (1)
intelligibility of a talker can be improved by training and/or
appropriate feedback; (2) a talker will articulate more clearly in
response to a better than to a poorly articulated message.
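
The parenthetical above reads a standard deviation of about 10% as meaning that roughly two-thirds of talkers fall inside a 20-point band. A minimal sketch of that arithmetic follows. It assumes talker scores are normally distributed and takes 70% as the mean only because it is the midpoint of the quoted 60-80 range; neither assumption is stated in the source.

```python
from statistics import NormalDist

# Assumptions for illustration only: the SD of about 10 points is quoted in
# the text; the mean of 70 is merely the midpoint of the quoted 60-80 range.
ASSUMED_MEAN = 70.0
QUOTED_SD = 10.0


def share_between(low, high, mean=ASSUMED_MEAN, sd=QUOTED_SD):
    """Proportion of a normal distribution lying between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


if __name__ == "__main__":
    share = share_between(60.0, 80.0)
    print(f"Expected share of talkers scoring 60-80 percent: {share:.3f}")
```

Under these assumptions the share within one standard deviation of the mean is about 0.68, which is the "2/3 of talkers" of the text.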

It is clear from the data of Abrams and Black that speech power
is the dominant contributor to intelligibility in noise, but that
other features of a voice are also involved. In one study (see
figure 5.7 of Hirsh¹³) a recorded list of PB words was displayed
on a graphic level recorder, and in a re-recording words which
were above the average in power were reduced and those below
increased, so that all words were at the same power. When the
re-recorded words were given to listening panels, rather than
homogenizing intelligibility, it turned out that they differed
even more widely in intelligibility than before.

Salmon¹⁴ selected words spoken by four talkers with the highest
and lowest intelligibility scores from an initial group of 20. The
words were analyzed to determine the effects of duration,
intensity, consonant-vowel ratio (C/V) and peak-clipping on
intelligibility. With regard to duration, vocalic sections of the
words spoken by the highly intelligible group were not
significantly different from the vocalic sections of the group
with low intelligibility. Vowels and consonants for the highly
intelligible group were on the average 4.0 and 10.3 dB greater
than for the low group; the group with low intelligibility
averaged C/V ratios of -18.9 dB against -12.5 dB for the most
intelligible group. The results for peak-clipping cannot be
generalized; when the C/V ratio exceeded 15 dB, peak-clipping
decreased intelligibility. Peak-clipping which resulted in a
consonant-vowel difference of less than 3 dB enhanced
intelligibility scores.
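
The consonant-vowel ratio discussed above is a level difference in decibels. A minimal sketch of how such a ratio could be computed from sample values follows. The function names and the use of RMS amplitude over hand-marked segments are assumptions for illustration; the source does not describe Salmon's measurement procedure. The last lines check that the group differences quoted in the text are mutually consistent: consonants 10.3 dB greater and vowels 4.0 dB greater imply a C/V ratio about 6.3 dB higher, against the 6.4 dB separating -12.5 and -18.9 dB.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def cv_ratio_db(consonant_samples, vowel_samples):
    """Consonant level minus vowel level, in dB (hypothetical helper)."""
    return 20.0 * math.log10(rms(consonant_samples) / rms(vowel_samples))


# Figures quoted in the text for the high versus low intelligibility groups.
VOWEL_ADVANTAGE_DB = 4.0        # vowels of the high group were this much greater
CONSONANT_ADVANTAGE_DB = 10.3   # consonants of the high group were this much greater
CV_HIGH_GROUP_DB = -12.5
CV_LOW_GROUP_DB = -18.9

implied_cv_gain = CONSONANT_ADVANTAGE_DB - VOWEL_ADVANTAGE_DB
reported_cv_gain = CV_HIGH_GROUP_DB - CV_LOW_GROUP_DB

if __name__ == "__main__":
    print(f"C/V gain implied by level differences: {implied_cv_gain:.1f} dB")
    print(f"C/V gain from reported ratios:         {reported_cv_gain:.1f} dB")
```

A consonant segment with one-tenth the RMS amplitude of the vowel gives a C/V ratio of -20 dB, the same order as the values quoted for the low-intelligibility group.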

Goodfriend¹⁵ attempted to measure proficiency of articulatory
gesture among talkers, and related articulation to
intelligibility. Kelly¹⁶ showed that talkers with longer-duration
syllables were more intelligible in noise.

Williams et al¹⁷ suggested, after studying eight talkers, that
intelligibility in noise was related to individual consonant-vowel
amplitude ratio. Griffiths et al¹⁸ corroborated this notion for
two talkers, and added that the more intelligible talker had more
high-frequency energy in consequence of the relatively more
powerful consonants in that voice.

Voice Recognition Studies. Communications engineers have often
expressed the need for speech analyses which identify the phonemic
invariances needed for satisfactory intelligibility, while
discarding all that is redundant. Stevens¹⁹, for example, showed
how the short-time autocorrelation function can be used in speech
analysis, particularly for fricatives. Potter and Steinberg²⁰
examined with the sound spectrograph and the cathode-ray sound
spectroscope the information-bearing aspects of vowels from 76
talkers, including women and children, representing a deliberate
attempt to vary pitch, inflection, rate of utterance and vocal
cavity dimensions. A large literature has now grown up on the
automatic recognition of a particular talker's voice (see Kersta²¹
and Pruzansky²²). It is these advanced techniques, the modern
equivalents of the cruder acoustic analysis of Abrams and his
colleagues⁷⁻¹⁰, which will one day identify those characteristics
of a voice which make it especially intelligible in this or that
condition of distortion.

An interesting experiment was performed by Peters²³, who
determined the degree of each of five types of distortion
electronically imposed upon the voice which led to a deterioration
of a listener's ability to match a voice to a given sample. Of
course, a voice might be quite distinctive (say Louis Armstrong or
Mae West) and identifiable under great distortion, but be quite
unintelligible under even slight distortion. However, much voice
identification research is likely to apply also to
intelligibility.

A special problem arises in connection with changes in a voice
depending on the emotional state of the talker, on his
physiological condition from day to day (state of larynx and nasal
cavities), or on the type of noise in the environment. Of course,
speech sounds have acoustic attributes independent of these
changes (and even common across all talkers) or the listener could
not decode the utterance into linguistic units intended by the
speaker; and we are of course not concerned with those acoustic or
perceptual qualities of the talker's voice which are free to be
actualized differently by a talker on different occasions (recall
Chaucer: "Somewhat he lipsed for his wantonnesse"). We are
concerned only with those acoustic perceptual constants which
contribute to individual differences in intelligibility in noise.
Stevens²⁴ speculates on the acoustic attributes which appear to be
effective in distinguishing one talker from another (and
inferentially to intelligibility differences in noise): average
fundamental frequency (length and mass of vocal cords); formant
frequencies (vocal tract length); spectra of nasal consonants
(shape of nasal cavities); spectrum shape of the vowel /I/ in the
range of the second, third, and fourth formants (dimensions of
oral cavity in relation to length of pharyngeal cavity); and
spectrum and intensity of the strident fricatives /s/ and /ʃ/
(configuration of hard palate and teeth); plus learned
articulatory habits. These are only some of the acoustic features
of a voice which may contribute to its intelligibility in noise.

In addition to acoustic analysis of voices, the ear itself can be
trusted to assign descriptive labels to a particular voice. From a
total of 49 bipolar verbal scales (clear-hazy, rough-smooth,
rumbling-shiny, fast-slow, etc.), Voiers²⁵ winnowed out 4
significant vocal features, using a factor-analytic technique.
These features of (1) clarity, (2) roughness, (3) magnitude, and
(4) animation were felt by Voiers to exhaust the labels to be put
on a voice. It would be important to know how any one or all of
these four perceptual attributes are related to intelligibility of
a particular voice in noise. If trained listeners could accurately
characterize a voice on one or more significant perceptual
variables, a tool would be ready at hand to select or reject
voices for certain duties or to design and later assess the effect
of certain voice-training procedures.

Holmgren²⁶ showed the relation between certain acoustic
measurements of a voice (rate of speaking, mean variance of
amplitude of unvoiced sounds, amplitude of voiced sounds,
fundamental frequency) and judgments on Voiers' four factors (see
also Clarke, Becker and Nixon²⁷ for further use of Holmgren's
perceptual scales).

It  is  clear  that  at  the  present  time 
speech  science  has  little  notion,  even  a 
general  notion,  of  either  the  acoustic 
or  perceptual  qualities  of  a  voice  which 
render  it  especially  intelligible  in  noise; 
and  such  specific  questions  have  hardly 
even  been  asked  as,  whether  a  voice  in¬ 
telligible  in  one  noise  is  also  relatively 
intelligible  in  other  noises,  or  whether 
a  talker  can  adjust  his  output  to  be 
more  intelligible  in  any  particular  type 
of  noise. 

A Statistical Approach to Variance Among Talkers. By now the
instruments and techniques for specifying the acoustic
characteristics of an individual voice are in a high state of
sophistication, but the contribution of the invariants to
intelligibility has not, even as yet, been quantified. In the
meantime, a statistical attack is all that is possible.

Inasmuch as not nearly enough is known about what acoustic cues
render a voice more intelligible, so that a "typical" voice cannot
be chosen in advance for an intelligibility test, it becomes
necessary to select voices in an effort to sample the normal
range.

Much earlier work on creating intelligibility tests used one
trained talker with general American dialect. It is now known that
for many purposes this is the worst possible approach. Pollack²⁸
adopted the expedient of pretesting a number of trained normal
talkers in his conditions, then selecting the most and the least
intelligible in an attempt to strike a mean. The conclusion of
Fletcher and Steinberg² has been mentioned that five voices are
needed to sample talker variance. No doubt a larger number would
be needed to represent untrained as well as trained talkers.
Harris²⁹ used an average of four adult male spoken voices, four
adult female spoken voices and four female whispered voices in one
intelligibility study, and later³⁰ used five male and five female
talkers, all normal but with wide ranges of age and vocal gesture,
within a single 50-word predictability test. However, talker
differences as such were not reported by either Pollack or Harris.

This paper reports inter-talker variability of intelligibility
in noise at S/N = -5 dB for ten young adult talkers, five men and
five women.


METHOD 

Subjects 

Talkers.  Five  young  men  and  five 
young  women  with  no  obvious  speech 
defects  or  strong  dialects  were  used. 

Listeners. Twenty normal-hearing college students, 10 male and
10 female, were used.

Recording  the  Speech  Material 

The Modified Rhyme Test (MRT)³¹ was selected on grounds of high
reliability and ease of administering and scoring.

Two equated 50-word lists of the MRT were taped with a Shure
Microphone and an Ampex PR-10 recorder. Each talker enunciated
five words on each list, the order of talkers in the second list
differing from that in the first. Male and female talkers were
interspersed.

Each talker was given a practice period prior to recording. The
carrier phrase "Hear the word" prefaced each stimulus word, and a
3-sec pause followed the stimulus word. The talker attempted to
maintain a constant level of vocal output by watching a VU-meter.
A calibrating tone at VU=0 was placed on each talker's tape.
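
The recording procedure in this section goes on to equate each talker's average speech power and to mix white noise with the speech at a fixed speech-to-noise ratio. The sketch below is a digital analogue of those two steps, not the tape procedure used in the study. It assumes that "average speech power" can be represented by the RMS level of the samples, that the target is the mean of the talkers' levels in dB, and that Gaussian noise stands in for the white noise source; all function names are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_db(samples):
    """Average level in dB relative to an RMS of 1.0 (arbitrary reference)."""
    return 20.0 * math.log10(rms(samples))


def equate_levels(recordings):
    """Scale each talker's recording to the mean dB level of all talkers.

    Returns the adjusted recordings and the gain in dB applied to each
    talker (the kind of per-talker adjustment a table could list).
    """
    levels = {name: level_db(x) for name, x in recordings.items()}
    target = sum(levels.values()) / len(levels)
    adjusted, gains = {}, {}
    for name, x in recordings.items():
        gains[name] = target - levels[name]
        factor = 10.0 ** (gains[name] / 20.0)
        adjusted[name] = [factor * s for s in x]
    return adjusted, gains


def mix_with_white_noise(speech, snr_db, rng):
    """Add Gaussian white noise so that speech level minus noise level is snr_db.

    Returns the mixture and the scaled noise that was added.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in speech]
    noise_target_rms = rms(speech) / (10.0 ** (snr_db / 20.0))
    scale = noise_target_rms / rms(noise)
    scaled = [scale * n for n in noise]
    return [s + n for s, n in zip(speech, scaled)], scaled


if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic "talkers" (plain sine waves) at different levels.
    demo = {
        "loud": [0.5 * math.sin(0.01 * i) for i in range(4000)],
        "soft": [0.1 * math.sin(0.02 * i) for i in range(4000)],
    }
    adjusted, gains = equate_levels(demo)
    for name, gain in gains.items():
        print(f"{name}: adjusted by {gain:+.1f} dB")
    mixture, _ = mix_with_white_noise(adjusted["loud"], -5.0, rng)
    print(f"mixture length: {len(mixture)} samples")
```

With a negative speech-to-noise ratio such as -5 dB the noise is the stronger component, which is why scores fall well below those obtained in quiet even for clear talkers.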

The tapes from each talker were then played one by one to a
General Radio Graphic Level Recorder, and the average speech power
for an individual talker was noted and compared to the mean for
all 10 talkers. In a re-recording, each talker's tape was adjusted
slightly (using a calibrating tone) so that the average speech
power was the same for all talkers. Table 1 shows by how much each
talker's tape was adjusted. These adjustments were made to
eliminate effects of overall intensity and comfortable talking
level as factors which would influence the intelligibility scores
obtained. White noise was mixed on the final tape at a -5 dB
speech-to-noise ratio in order to approximate 60% correct
responses by listening panels (see Sergeant and McKay³² for
discussion of S/N ratios and speech intelligibility).


RESULTS AND DISCUSSION

The mean correct scores for the two groups of talkers and the two
groups of

listeners, by list, are shown in Table 2. The raw scores used in
determining these mean values were subjected to an analysis of
variance (Bruning and Kintz, 1968) according to talkers, listeners
and sex. Sex of the listeners was not significant as a factor in
the intelligibility scores, and consequently the data can be
collapsed across listeners. The test for list difference was also
not significant, and the lists can be similarly collapsed. The
interaction between sex of the talker and the list (either A or B)
was statistically significant (F of 8.6, df 1/18) and can be
attributed to the larger difference for List B as contrasted with
the slight difference observed for List A (male/female talker
differences were 4% for List A and 14% for List B).

Analysis of variance showed a significant difference between the
variances of listeners' responses to the male vs female talkers
(F of 47.0, df 1/18). It is logical to assume that this difference
in variances was caused by either true talker differences or
errors associated with insufficient sampling of the talkers'
speech. The second of these was considered the more likely for the
scores obtained during this study.
Table 2. Talker and listener intelligibility in mean percent correct
response, by sex and list. Note: Entries are rounded to nearest whole
number.

                              LISTENER INTELLIGIBILITY
TALKER INTELLIGIBILITY
Sex           List        Male      Female    Both Sexes

Male          A            60         54          57
              B            51         57          54
              A & B        55         56          55

Female        A            59         62          61
              B            66         70          68
              A & B        62         66          64

Both Sexes    A            59         58          59
              B            58         63          61
              A & B        59         61          60

Because of the problem in sampling the talkers' speech, a second
evaluation was made of the speech material. The talker scores
shown in Table 3 are percent correct responses to each talker's
words (total responses for each talker was 10 words x 20
listeners, or 200). The range of overall intelligibility, 47.5 -
75.5%, is about what is usually reported in the literature.
Difference between means by sex was not significant (t=1.87).
Apparently the limited sampling does not permit a sufficiently
strong answer to the question of a difference in the
intelligibility of the speech of men and women. On the other hand,
there is no question that the ranges of intelligibility for the
two groups of talkers overlap greatly.

Table 3. Intelligibility in percent words correct for individual
talkers

Sex       Talker    Intelligibility Score

FEMALE    MK        75.5
          SM        65.5
          BK        65.5
          CS        60.5
          RG        57.5
          MEAN      64.9

MALE      TM        67.0
          JR        59.5
          EN        57.0
          TK        50.5
          PL        47.5
          MEAN      56.3

It is concluded that when talkers are limited to a vocabulary of
ten different words each, the variance of intelligibility among
talkers is not reduced further than commonly reported in the
literature, even when a procedure is followed which adjusts each
voice for constant overall peak intensity at a comfortable talking
level.

It is obvious from the literature, and we see here as well, that
many features of a voice beyond its overall loudness contribute to
its being resistant to white noise masking. For example, the voice
of MK (Table 3) was about 10 percentage points more intelligible
than the next most intelligible voice. Certainly the intensity of
this voice is not its most distinctive characteristic. But just
which acoustic feature(s) of this voice render it most
intelligible, and at the other extreme, which feature(s) render
the voice of PL least intelligible, if not the particular words
assigned to these talkers, have not as yet been determined.
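
The t value reported for the difference between male and female talkers can be checked against the individual scores in Table 3. The sketch below assumes the report used an ordinary two-sample t-test with pooled variance on the ten talker scores (the source does not say which form of t-test was used). Under that assumption the computed value is about 1.87 on 8 degrees of freedom, which agrees with the reported t and falls short of the two-tailed 5% critical value of 2.306.

```python
import math
from statistics import mean, variance

# Individual talker scores (percent words correct) from Table 3.
FEMALE = [75.5, 65.5, 65.5, 60.5, 57.5]
MALE = [67.0, 59.5, 57.0, 50.5, 47.5]


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (mean(a) - mean(b)) / se_diff, df


if __name__ == "__main__":
    t, df = pooled_t(FEMALE, MALE)
    print(f"female mean = {mean(FEMALE):.1f}, male mean = {mean(MALE):.1f}")
    print(f"t = {t:.2f} on {df} degrees of freedom")
```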

A practical question is how many fewer voices would be needed to
sample the intelligibility of these 10 voices. A random selection
of triads shows that any 3 voices would yield an average score
within the range of the true mean (obtained mean ± 3 S.E.), but
that 5 voices are needed to specify a mean within ± 1 S.E. of the
obtained mean for all 10 talkers. Especially where the same words
may be enunciated by all, a maximum of 5 voices should adequately
sample the average intelligibility in noise of talkers with clear,
unaccented, dialect-free speech.
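
The subset question above can be explored directly from the Table 3 scores. The sketch below enumerates every possible subset of k talkers, rather than the random selection the report describes, and gives the fraction of subsets whose mean falls within a chosen number of standard errors of the ten-talker mean. It assumes the S.E. is the standard error of the mean across the ten talker scores (sample standard deviation divided by the square root of 10); the report does not state how its S.E. was obtained, so the fractions need not match the report's statements.

```python
import math
from itertools import combinations
from statistics import mean, stdev

# Individual talker scores (percent words correct) from Table 3.
TABLE3_SCORES = {
    "MK": 75.5, "SM": 65.5, "BK": 65.5, "CS": 60.5, "RG": 57.5,
    "TM": 67.0, "JR": 59.5, "EN": 57.0, "TK": 50.5, "PL": 47.5,
}


def standard_error(scores):
    """Standard error of the mean across talkers (assumed definition)."""
    return stdev(scores) / math.sqrt(len(scores))


def fraction_within(scores, k, n_se):
    """Fraction of all k-talker subsets whose mean lies within
    n_se standard errors of the mean of all talkers."""
    grand_mean = mean(scores)
    tolerance = n_se * standard_error(scores)
    subsets = list(combinations(scores, k))
    hits = sum(1 for s in subsets if abs(mean(s) - grand_mean) <= tolerance)
    return hits / len(subsets)


if __name__ == "__main__":
    scores = list(TABLE3_SCORES.values())
    print(f"mean of 10 talkers = {mean(scores):.1f}, "
          f"S.E. = {standard_error(scores):.1f}")
    for k, n_se in [(3, 3), (5, 1)]:
        share = fraction_within(scores, k, n_se)
        print(f"{k} talkers within +/-{n_se} S.E.: {share:.2f} of subsets")
```

Exhaustive enumeration is feasible here because ten talkers yield only 120 triads and 252 sets of five; with larger talker pools a random sample of subsets, as in the report, would be the practical choice.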

SUMMARY 

Five young men and five young women each tape-recorded 10 words
at a comfortable talking level from two lists of the Modified
Rhyme Test. The output of each word was examined with a graphic
level recorder, and in a re-recording the mean level of each voice
was adjusted to the average for all voices. Intelligibility tests
in the presence of background noise (S/N was -5 dB) were presented
by loudspeaker simultaneously to 10 young men and 10 young women.
There were no differences due to sex of listeners, nor to word
list. Mean differences due to sex of talkers were not significant
by t-test.

Range of intelligibility among talkers in percent of words
correctly perceived was 47.5 - 75.5, even though all voices had
been equated for overall intensity. Most of this range could,
therefore, not be attributed to voice intensity or comfortable
speaking level, but to some other specific characteristic(s) of
the talker's voice. Since no two talkers enunciated the same
words, differences in intelligibility residing in the words
themselves perturb these data. It was found that no fewer than
five of these talkers would be required to specify the true mean
of all ten voices within one standard error, even though all
voices were of clear enunciation, unaccented, and dialect-free.